1 HDFS

2 Characteristics of HDFS

3 Parallel Computing Paradigms

4 Areas where HDFS is a good fit

5 Areas where HDFS is not a good fit today

6 Some Basic HDFS Concepts

6.1 The Underlying Filesystem Options

  • ext3 (third extended file system): commonly used under HDFS
    • Used by Yahoo
    • Released in 2001
    • Filesystem sizes up to 32 TiB (individual files up to 2 TiB)
  • ext4 (fourth extended file system): commonly used under HDFS
    • Used by Google
    • Released in 2008
    • Fast, like XFS
    • Filesystem sizes up to 1 EiB (individual files up to 16 TiB)
  • XFS
    • Released in 1993 by Silicon Graphics
    • Fast (parallel I/O)
    • Some drawbacks (an XFS filesystem cannot be shrunk; metadata operations are relatively slow)

6.2 Blocks

  • Disk blocks: The minimum amount of data that a disk can read or write, normally 512 bytes
  • Filesystem blocks (on a single disk): An integral multiple of the disk block size, typically a few kilobytes in size
  • HDFS blocks (abstraction): A much larger unit, 128 MB by default (64 MB in older releases). Files in HDFS are broken into block-sized chunks, which are stored as independent units. Unlike a filesystem for a single disk, a file in HDFS that is smaller than a single block does not occupy a full block’s worth of underlying storage

6.2.1 Why Is a Block in HDFS So Large?

  • To minimize the cost of seeks. With a large block size (64 or 128 MB), the time to transfer the data from the disk is significantly greater than the time to seek to the start of the block, so the data transfer can operate at close to the disk transfer rate
    • Example: If the seek time is 10 ms and the transfer rate is 100 MB/s, to make the seek time 1% of the transfer time, the block size should be around 100 MB
  • The map tasks in MapReduce operate on one block at a time, so if you have too few tasks (fewer tasks than nodes in the cluster), your jobs will run slower than they otherwise could
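The 1% rule above can be checked with a little shell arithmetic (the figures are the ones from the example; the variable names are mine):

```shell
# Example figures from the text: 10 ms seek time, 100 MB/s transfer rate.
seek_ms=10
rate_mb_per_s=100
target_pct=1    # seek time should be ~1% of transfer time

# Transfer time must be seek_ms / (target_pct/100) = 1000 ms = 1 s,
# so the block must hold rate * 1 s = 100 MB of data.
block_mb=$(( rate_mb_per_s * seek_ms * 100 / (target_pct * 1000) ))
echo "suggested block size: ${block_mb} MB"
```

This yields 100 MB, which is why the default of 128 MB is in the right neighborhood.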

6.2.2 Benefits of Block Abstraction for HDFS

  • A file can be larger than any single disk in the network
  • Making the unit of abstraction a block simplifies the storage subsystem (it is easy to calculate how many blocks can be stored on a given disk)
  • Blocks fit well with replication for providing fault tolerance and availability. Each block is replicated to a small number (typically 3) of physically separate machines. If a block becomes unavailable, a copy can be read from another location in a way that is transparent to the client. A block that is no longer available due to failure can be replicated from its alternative locations to other live machines to bring the replication factor back to the normal level
  • The fsck command in HDFS understands blocks. For example, running: hdfs fsck / -files -blocks will list the blocks of all the files in the filesystem

6.2.3 A Hadoop Cluster

  • A Hadoop cluster is composed of a Name Node and a cluster of Data Nodes

6.3 Name Nodes and Data Nodes

  • The name node manages the filesystem namespace. It maintains the filesystem tree and the metadata for all files and directories (held in memory). This information is stored persistently on the local disk in two forms: the namespace image and the edit log. The name node also knows the data nodes on which all the blocks of a given file are located; block locations are not stored persistently, however, because they are reconstructed from data node reports when the system starts
  • Data nodes are the workhorses of the filesystem. They store and retrieve blocks when they are told to (by clients or the name node), and they report back to the name node periodically with lists of blocks that they are storing

6.3.1 The Importance of the Name Node

Without the name node, the filesystem cannot be used, so the name node is a single point of failure (SPOF). It is critical to make this node resilient to failure. There are two mechanisms:

  • Back up the files that make up the persistent state of the filesystem metadata. The name node writes its persistent state to multiple filesystems: the local disk as well as a remote NFS mount
  • Run a secondary name node. Despite its name, in Hadoop 1 it did not act as a name node; instead, it periodically merged the namespace image with the edit log (a checkpoint), which can be used to rebuild a new name node if the primary fails completely

6.4 HDFS High-Availability

  • Hadoop 2 implements a pair of name nodes in an active-standby configuration as a standard operational procedure. If the active name node fails, the standby can take over quickly (in a few tens of seconds)
  • Data nodes send block reports to both name nodes since the block mapping is stored in the memory of name nodes
  • Only one name node is active at a time
  • Each name node runs a failover controller process that monitors its health using a heartbeating mechanism
  • A fencing mechanism prevents a failing, formerly active name node from damaging the cluster, by killing all its processes or even “shooting the other node in the head” (cutting its power)

7 The Name Node Holds the Metadata

8 The Name Node Has an Embedded Web Server

It shows some statistics of the filesystem:

9 An Example of Name Node Statistics


10 A Diagram of an HDFS Cluster

11 Network Distance in an HDFS Cluster

Use the bandwidth between two nodes as a measure of distance:

12 Name Node is Rack Aware

13 Basic Filesystem Operations

14 Some Common Hadoop Commands

15 Create an HDFS directory at System Root

[root@sandbox ~]# hadoop fs -mkdir /stsci5065
[root@sandbox ~]# hadoop fs -mkdir /stsci5065/data
[root@sandbox ~]# find /stsci5065/data
find: '/stsci5065/data': No such file or directory
[root@sandbox ~]# ls /stsci5065
ls: cannot access /stsci5065: No such file or directory
(The local find and ls commands fail because /stsci5065 exists only in HDFS, not in the local Linux filesystem; HDFS paths are visible only through hadoop fs commands.)

16 The New Directory is Visible in HDFS

[root@sandbox ~]# hadoop fs -ls /

17 Download Some Data Files from Internet

wget http://www.gutenberg.org/files/100/100-0.txt
wget ftp://ftp.ncbi.nih.gov/genomes/INFLUENZA/influenza.cds
wget http://www.gutenberg.org/files/74/74-0.txt

Note:

18 File Permissions in HDFS

19 Load a Data File into HDFS with Commands

hadoop fs -copyFromLocal pg100.txt /stsci5065/data
hadoop fs -put tom_sawyer.txt /stsci5065/data
hadoop fs -put influenza.cds /stsci5065/data

Note: You can load multiple files at once with one command
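A minimal local rehearsal of a multi-file load (the file names are mine; the actual -put is left commented because it needs a running cluster):

```shell
# Two small local files to stage (hypothetical names).
printf 'one\n' > f1.txt
printf 'two\n' > f2.txt

# -put accepts several local sources followed by a single HDFS
# destination directory; uncomment on a live cluster:
# hadoop fs -put f1.txt f2.txt /stsci5065/data
echo "staged f1.txt and f2.txt for upload"
```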

20 List the Files Just Loaded

[root@sandbox /]# hadoop fs -ls /stsci5065/data

Results:

[root@sandbox ~]# hadoop fs -ls /stsci5065/data
Found 3 items
-rw-r--r--   1 root hdfs    5856576 2018-02-15 23:20 /stsci5065/data/100-0.txt
-rw-r--r--   1 root hdfs 1082513347 2018-02-15 23:20 /stsci5065/data/influenza.cds
-rw-r--r--   1 root hdfs     433025 2018-02-15 23:20 /stsci5065/data/tom_sawyer.txt

20.1 Check the Name Node Browser at port 50070

20.2 The Contents of /stsci5065/data in Data Node Browser

20.3 File information of pg100.txt

20.4 File information of influenza.cds

20.5 File information of influenza.cds, the 5th Block

20.6 The Sizes of the File Blocks

  • Go with the default block size 128 MB/block.
  • File size = 1.01 GB (1,082,513,347 bytes, influenza.cds)
    • Blocks 1–8: 128 MB each
    • Block 9: the remaining ~8.4 MB
  • The last HDFS block only takes the storage of its actual size (~8.4 MB), not the full default block size
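The block count can be checked from the file size with shell arithmetic (the size is taken from the -ls output earlier):

```shell
size=1082513347                 # influenza.cds, from hadoop fs -ls
block=$(( 128 * 1024 * 1024 ))  # default block size, 128 MB

full=$(( size / block ))        # number of full 128 MB blocks
rem=$(( size % block ))         # bytes left over for the final, partial block
total=$full
[ "$rem" -gt 0 ] && total=$(( full + 1 ))

echo "$total blocks; final block holds $rem bytes"
```

This reports 9 blocks, with the last one holding about 8.4 MB.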

20.7 Can Also Run a Command to See the Number of Blocks

hdfs fsck /stsci5065/data -files -blocks | less


20.8 Copy a File from HDFS to Local Filesystem

  • hadoop fs -copyToLocal /stsci5065/data/100-0.txt 100-0.copy.txt (copies it to the current directory, in this case /root)
  • hadoop fs -copyToLocal /stsci5065/data/tom_sawyer.txt /t_s.copy.txt (copies it to the system root)

20.9 Compare t_s.copy.txt with tom_sawyer.txt

  • openssl dgst -md5 /t_s.copy.txt ~/tom_sawyer.txt (identical digest values indicate the two files are the same, i.e., the file made the round trip into HDFS and back intact)
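The same round-trip check can be rehearsed locally (the sample file names are mine; on the cluster you would compare the retrieved copy against the original as above, and openssl is assumed to be installed):

```shell
# Make a sample file and a copy, standing in for the original
# and the file retrieved from HDFS.
printf 'round trip test\n' > original.txt
cp original.txt retrieved.txt

# openssl prints "MD5(file)= <hash>"; awk keeps just the hash.
# Matching hashes mean byte-identical content.
h1=$(openssl dgst -md5 original.txt | awk '{print $NF}')
h2=$(openssl dgst -md5 retrieved.txt | awk '{print $NF}')
[ "$h1" = "$h2" ] && echo "files are identical"
```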

21 Hadoop Filesystems

22 Interfaces to Hadoop’s Filesystems: HTTP

Hadoop is written in Java, so most Hadoop filesystem interactions are mediated through the Java API. HTTP interfaces such as WebHDFS expose the same filesystem operations to clients written in other languages
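For example, the WebHDFS REST interface serves most filesystem operations over HTTP on the name node's web port (the host name below is hypothetical; the port matches the name node browser port used earlier):

```shell
NN="namenode.example.com"   # hypothetical name node host
URL="http://${NN}:50070/webhdfs/v1/stsci5065/data?op=LISTSTATUS"
echo "$URL"
# On a live cluster:
#   curl -i "$URL"   # returns the directory listing as JSON
```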

23 A Client Reading Data from HDFS

24 FUSE: Filesystem in Userspace

25 HDFS 2.x Features